C/3 



> 



X 



Dynamic generalized linear models for non-Gaussian time series 

forecasting 



00 

O ■ K. Triantafyllopoulos* 

O 
(N 



February 5, 2008 



Abstract 



. The purpose of this paper is to provide a discussion, with illustrating examples, on 

Bayesian forecasting for dynamic generalized linear models (DGLMs). Adopting approx- 
imate Bayesian analysis, based on conjugate forms and on Bayes linear estimation, we 
describe the theoretical framework and then we provide detailed examples of response dis- 
tributions, including binomial, Poisson, negative binomial, geometric, normal, log-normal, 
gamma, exponential, WeibuU, Pareto, beta, and inverse Gaussian. We give numerical il- 
lustrations for all distributions (except for the normal). Putting together all the above 
distributions, we give a unified Bayesian approach to non-Gaussian time scries analy- 
sis, with applications from finance and medicine to biology and the behavioural sciences. 
Throughout the models we discuss Bayesian forecasting and, for each model, we derive 
^SJ ■ the multi-step forecast mean. Finally, we describe model assessment using the likelihood 

' function, and Bayesian model monitoring. 

. Some key words: Bayesian forecasting, non-Gaussian time series, dynamic generalized 

' linear model, state space, Kalman filter. 
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1 Introduction 



In the past three decades non-Gaussian time series have attracted a lot of interest, see e.g. 
; Cox (1981), Kaufmann (1987), Kitagawa (1987), Shephard and Pitt (1997), and Durbin 

and Koopman (2000), among others. In the context of regression modelling, generalized 
linear models (McCullagh and Nelder, 1989; Dobson, 2002) offer a solid theoretical basis for 
statistical analysis of independent non-normal data. A general framework for dealing with 
time series data is the dynamic generalized linear model (DGLM), which considers generalized 
linear modelling with time-varying parameters and hence it is capable to model time series 
data for a wide range of response distributions. DGLMs have been widely adopted for non- 
normal time series data, see e.g. West et al. (1985), Gamerman and West (1987), Fahrmeir 
(1987), Friihwirth-Schnatter, S. (1994), Lindsey and Lambert (1995), Chiogna and Gaetan 
(2002), Hemming and Shaw (2002), Godolphin and Triantafyllopoulos (2006), and Gamerman 
(1991, 1998). Dynamic generalized linear models are reported in detail in the monographs 
of West and Harrison (1997, Chapter 14), Fahrmeir and Tutz (2001, Chapter 8), and Kedem 
and Fokianos (2002, Chapter 6). 
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In this paper we propose a unified treatment of DGLMs that includes approximate Bayesian 
inference and multi-step forecasting. In this to end we adopt the estimation approach of West 
et al. (1985), but we extend it as far as model diagnostics and forecasting are concerned. In 
particular, we discuss likelihood-based model assessment as well as Bayesian model mon- 
itoring. In the literature, discussion on the DGLMs is usually restricted to the binomial 
and the Poisson models, see e.g. Fahrmeir and Tutz (2001, Chapter 8). Even for these re- 
sponse distributions, discussion is limited on estimation, while forecasting and in particular 
multi-step forecasting does not appear to have received much attention. We provide detailed 
examples of many distributions, including binomial, Poisson, negative binomial, geometric, 
normal, log-normal, gamma, exponential, Weibull, Pareto, two special cases of the beta, and 
inverse Gaussian. We give numerical illustrations for all distributions, except for the normal 
(for which one can find numerous illustrations in the time series literature) using real and 
simulated data. 

The paper is organized as follows. In Section [2] we discuss Bayesian inference of DGLMs. 
Section [3] commences by considering several examples, where the response time series follows 
a particular distribution. Section H] gives concluding comments. The appendix includes some 
proofs of arguments in Section [3l 



2 Dynamic generalized linear models 
2.1 Model definition 

Suppose that the time series {yt} is generated from a probability distribution, which is a 
member of the exponential family of distributions, that is 

PiVtht) = exp (^^^ {z{ytht - b{jt))^ c{yt, (fit), (1) 

where 7^, known as the natural parameter, is the parameter of interest and other parameters 
that can be linked to c^j, a(.), h{.) and c(., .) are usually referred to as nuisance parameters or 
hyperparameters. The functions a(.), b{.) and c(., .) are assumed known, (j)t,a{(j)t),c{yt,4't) > 
0, 6(7t) is twice differentiable and according to Dobson (2002, §3.3) 

^{z{yt)\lt) = and Yar {z{yt)\-ft) = ^ . 

The function z[.) is usually a simple function in yt and in many cases it is the identity function; 
an exception of this is the binomial distribution. If z{yt) = yt, distribution ([T|) is said to be in 
the canonical or standard form. Dobson (2002, §3.3) gives expressions of the score statistics 
and the information matrix, although the consideration of these may not be necessary for 
Bayesian inference. 

The idea of generalized linear modelling is to use a non-linear function g{-), which maps 
^t = IE(yj|7t) to the linear predictor rjt, this function is known as link function. If g{iit) = 7t; 
this is referred to as canonical link, but other links may be more useful in applications (see 
e.g. the inverse Gaussian example in Section [3. 2p . In GLM theory, r]t is modelled as a linear 
model, but in DGLM theory, the linear predictor is replaced by a state space model, i.e. 

g{Ht) = m = Ft^t and 0t = GtOt-i + wt, 
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where Ft is a d x 1 design vector, Gt is a d x d evolution matrix, 0^ is a d x 1 random vector 
and LOt is an innovation vector, with zero mean and some known covariance matrix 0^. It is 
assumed that LOt is uncorrelated of tOs (for t ^ s) and ujt is uncorrelated of for all ^- It is 
obvious that if one sets Gt = Ip (the d x d identity matrix) and cvt = (i.e. its covariance 
matrix is the zero matrix), then the above model is reduced to a usual GLM. 

For the examples of Section [3] we consider simple state space models, which assume that 
Ft = F, Gt = G, i^t = ^ are time-invariant. However, in the next sections, we present 
Bayesian inference and forecasting for time- varying Ft, Gt, ^t in order to cover the general 
situation. 



2.2 Bayesian inference 



Suppose that we have data yi, . . . ,yT and we form the information set = {yi, . . . , yt}, for 
t = 1, . . . ,T. At time t — lwe assume that the posterior mean vector and covariance matrix of 
6t-i are nit-i and Pt-i, respectively, and we write 9t-i\y^~^ ~ {nit-i, Pt-i). Then from 9t = 
GtOt~i + (j-'t, it follows that 6t\y*~^ ~ {ht, Rt), where ht = Gtmt-i and Rt = GtPt~iG[ + Qf 
The next step is to form the prior mean and variance of ijt and 6t, that is 
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where ft = F^ht and qt = F^RtFt. The quantities ft and qt are the forecast mean and variance 
of r]t. 

In order to proceed with Bayesian inference, we assume the conjugate prior of jt, so that 



P{lt\y ) = i^in, St) exp(n7t - stb{jt)), 



(3) 



for some known rt and sj. These parameters can be found from g{fit) = Vt and ft = K{r]t\y^ 
qt = Var(?yt|?/*~^), which are known from ([2]). The normalizing constant k{.,.) can be found 

by 



Kin, St) 



exp(rt7t - stb{'yt)) djt 



where the integral is Lebesque integral, so that it includes summation / integration of discrete 
/ continuous variables. We note that in most of the cases, the above distribution will be 
recognizable (e.g. gamma, beta, normal) and so there is no need of evaluating the above 
integral. One example that this is not the case is the inverse Gaussian distribution (see 
Section [3. 2p . 

Then observing yt, the posterior distribution of 74 is 



p{it\y^ 



p{yt\it)p{it\y 



! p{yt\it)p{ntW '^)dit 



1 



, , z{yt) 



a((/)t)' a{4>t) 



exp [\rt + 



zjyt) 

a{4>t] 



It 



St + 



a{<Pt) 



Kit) ■ (4) 



In many situations we are interested in parameters that are given as functions of jt- In 
such cases we derive the prior /posterior distributions of 7t as above and then we apply a 
transformation to obtain the prior /posterior distribution of the parameter in interest. The 
examples of Section [3] are illuminative. 
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Finally, the posterior mean vector and covariance matrix of 6t are approximately given by 

0t\y' ^imt,Pt), (5) 

with 

mt = ht+ RtFtifl - ft)/qt and Pt = Rt - RtFtFlRt{l - q*t/qt)/qt, 

where = ¥.{rjt\y*) and q^ = E(r/t|y*) can be found from g{fj,t) = and the posterior (jj]). 
The priors ([2|), ([3]) and the posteriors ([4]), ([5]) provide an algorithm for estimation, for any 
t = 1, . . . , T. For a proof of the above algorithm the reader is referred to West et al. (1985). 

An alternative approach for the specification of rt and st is to make use of power dis- 
counting and this is briefly discussed next. The idea of power discounting stem in the work of 
Smith (1979); power discounting is a method of obtaining the prior distribution at time t + 1, 
from the posterior distribution at time t. Here we consider a minor extension of the method 
by replacing t + 1 by t + for some positive integer L Then, according to the principle of 
power discounting, the prior distribution at time t + i Is proportional to {p{'^t\y^)Y ■, where 5 
is a discount factor. Thus we write 

p{it+i\y') ^ {p{it)\y')^ , for 0<5<i. 

This ensures that the prior distribution of "ft+i is flatter than the posterior distribution of 7^. 
The above procedure assumes that rt{t) = rt+i and st{t} = s^+i, which implicitly assumes a 
random walk type evolution of the posterior /prior updating, in the sense that Bayes decisions 
in the interval (t, t + t) remain constant, while the respective expected loss (under step loss 
functions) increase (Smith, 1979). 



2.3 Bayesian forecasting and model assessment 

Suppose that the time series {yt} is generated by density ([I]) and let be the information 
set up to time t. Then the £-step forecast distribution of yt+i is 



where rt(i) and Sf{i) are evaluated from ft{i) and qt{i), the mean and variance of rjt+ily^, 
and the distribution of 7t+^|y*, which takes a similar form as the distribution of 7t|y*~^. 

Model assessment can be done via the likelihood function, residual analysis, and Bayesian 
model comparison, e.g. based on Bayes factors. The likelihood function of 71, . . . ,7T) based 
on information y'^ is 

T 

L(7i,... ,7T;y^) = Y{p{ytht)p{7tht^i), 
t=i 

where the first probability in the product is the distribution ([T|) and the second indicates the 
evolution of 7t, given ^t-i- Then the log-likelihood function is 

^ / 1 \ ^ 

^(71, • • • ,7T;y^) = ^ ( —rpriziytht - b{-ft)) + log c{yt,(pt) j + ^ logp(7f |7f_i). (7) 

The likelihood function can be used as a means of model comparison (for example looking at 
two model specifications, which differ in some quantitative parts, we choose the model that 
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has larger likelihood). For model assessment the likelihood function can be used in order to 
choose some hyperparameters (discount factors, or nuisance parameters) so that the likelihood 
function is maximized in terms of these hyperparameters. The evaluation of d?]) requires the 
distribution p{'yt\lt-i)- This depends on the state space model for r/j used. In the examples 
of Section [3] we look at these probabilities, based mainly on Gaussian random walk evolutions 
for rjt, but also we consider a linear trend model for r]t- Note that the consideration of uot 
following a Gaussian distribution does not imply that 6t\y^ follows a Gaussian distribution 
too, since the distribution of may not be Gaussian. 

For the sequential calculation of the Bayes factors (which for Gaussian responses are 
discussed in Salvador and Gargallo, 2005), a typical setting suggests the formation of two 
models Mi and ^A2, which differ in some quantitative aspects, e.g. some hyperparameters. 
Then, the cumulative Bayes factor of Aii against M.2 is defined by 



p[yt,---,yt-k+i\y* '^,M2) t+i 



where Hi{l) = Ht{0) = 1, for all t, and p{yt, ■ ■ ■ ,yt-k+i\y , Mj) denotes the joint distribu- 
tion oi yt, . . . , given ?/*~^, for some integer < k < t and j = 1, 2. Then preference 
of model 1 would imply larger forecast distribution of this model (or Ht{k) > 1); likewise 
preference of model 2 would imply Ht{k) < 1; Ht{k) = 1 implies that the two models are 
probabilistically equivalent in the sense they provide the same forecast distributions. 

3 Examples 

3.1 Discrete distributions for the response yt 
3.1.1 Binomial 

The binomial distribution (Johnson et al, 2005) is perhaps the most popular discrete distri- 
bution. It is typically generated as the sum of independent success/failure bernoulli trials and 
in the context of generalized linear modelling is associated with logistic regression (Dobson, 
2002). 

Consider a discrete- valued time series {yt}, which, for a given probability vr^, follows the 
binomial distribution 

p(ytKt) = (^^I^Trf (l-7r0"^-^% yt = 0,1,2,..., nt; nt = 1,2,...; < < 1, 

where (""') denotes the binomial coefficient. It is easy to verify that the above distribution 

is of the form ^ with z{yt) = yt/nt, jt = log7rt/(l - irt), a{4>t) = (jy^^ = n^"^ , b{^t) = 
log(l -|-exp(7t)), and 0(^1, (pt) = (^')- The logarithmic, known also as logit, link r]t = g{fJ.t) = 
jt = log'7rf/(l — TTt) maps irt to the linear predictor r]t, which with the setting rjt = F'6t and 
Ot = GOt^i + ujt, generates the dynamic evolution of the model. 

The prior of 7rt|y*~^, follows by the prior of 7t|y*~^ and the transformation 7^ = log7rt/(l — 
TTt) as beta distribution ■Kt\y^~^ ~ B{rt, st — rt), with density 
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where r(.) denotes the gamma function and St > rt > 0. Then, observing yt, the posterior of 
TTtly* is TTtly* ~ B{rt + yt, st + nt - n - yt). 

In the appendix it is shown that, with ft and qt the prior mean and variance of rjt, an 
approximation of rt and st is given by 

l + exp(/,) 2 + exp(/,) + exp(-/,) 
n = and St = . (9) 

qt qt 

In order to proceed with the posterior moments of 9t\y^ as in ([S]), we can see that 

dip{x) 



ft = i^in + yt) - il^ist -n + nt- yt) and ql ^ ^^^^^ 



dx 



x=rt+yt 



x=st+nt—yt 



where '0(.) denotes the digamma function (see the Poisson example and the appendix). In 
the appendix approximations of Tp{.) and of its first derivative (also known as trigamma 
function) are given. These definitions as well as the parameters of the beta prior are slightly 
different from the ones obtained by West and Harrison (1997), as these authors use a different 
parameterization, which does not appear to be consistent with the prior /posterior updating. 
Given information i/*, the £-step forecast distribution is obtained by first noting that 

7:t+i\y' ^ B(rt{l),st[l) - rt{l)), (10) 

where rt{tj and st{tj are given by rt and sj, if ft and qt are replaced by ft{t) = E(7y(4.£|y*) 
and qt{t) = Var(ryt+£|y*), which are calculated routinely by the Kalman filter (see Section [2]). 
Then the £-step forecast distribution is given by 

, t^ ^{stU)) 

p{yt+i\y') - ^' 



T{rtii))T{st{i) - rt{i))Tist{i) + nt+e) 

x^h^')T{rtii) + yt+e)ristii) - rt{i) + nt+i - yt+i). 
nt+e. \yt+ej 

We can use conditional expectations in order to calculate the forecast mean and variance, i.e. 

yt{l) = nyt^e\y') = nnyt~,i\^t+e)\y') = ^'^^f'^^^^^^^ 



and 



Yai{yt+eW') = ^{\ar{yt+e\n+e)\y^) + ^ai{¥.{yt+e\TTt+e)) 

nt+iirtil) + 1) nt+eirtil) + l)(n(£) + 2) 



n(^) + st{€) + 1 {rt{l) + st{l) + \){rt{l) + stii) + 2) 



n\^Art{t) + \)st{i) 



(rt(£) + st(£) + l)2(rt(^) + st(^) + 2) 



For the specification of rt and st, we can alternatively use power discounting (see Section 
ED. This yields 

rt+\ = Jrj + (5?/t + 1 - 5 and st+\ = 5st + (^n* + 2 - 5, 
where 5 is a discount factor and ro, sq are initially given. 
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For the evolution of i]t via 9t, the obvious setting is the random walk, which sets r]t = 9t = 
Ot-i + wj. From the logit link we have vr(/(l — vr^) = exp(0t) and so the evolution of Ot yields 



exp(wf)7rt- 



1 - 7rj_i + exp(wj)7rt„i ' 



which gives the evolution of vrf, given irt-i, as a function of the Gaussian shock uot- Then the 
distribution oi iTtWt-i is 



p[TTt\T^t-l) = r-— exp -:-=r log — 

V27rr27r((l - VTf) \ III \ 7r(„i(l - vtj j ^ 
and so from ([7]) the log-likelihood function is 

^{^Tl,...,■KT■,y^) = ^ f yt logvTt - ytlog(l - TTt) +njlog(l - TTt) +log 
t=l ^ 

- log V^Ml - vr.) - ^ (log ^ 

The Bayes factors are easily computed from ([8]) and the forecast distribution p{yt+i\y^). 
If we use a linear trend evolution on 9t, we can specify 



Vt = [1 0] 



^21 



and 



ht 



1 1 

1 



+ 



^2t 



Here = [On 621]' is a 2-dimensional random vector and cj^ = [ujit u!2t\' follows a bivari- 
ate normal distribution with zero mean vector and some known covariance matrix. Then, 
conditional on TTt-i, from the logit link function we can recover the relationship of vr^ as 



exp(6'2,o + Ya=i + c^if )7rt^i 

1 - TTt^l + exp{02fi + YlUi UJ2i + wit)7rt_i 



To illustrate the binomial model, we consider the data of Godolphin and Triantafyllopoulos 
(2006), consisting of quarterly binomial data over a period of 11 years. In each quarter nt = 25 
Bernoulli trials are performed and yt, the number of successes, is recorded. The data, which 
are plotted in Figure [H show a clear seasonality and therefore, modelling this data with GLMs 
is inappropriate. The data exhibit a trend/periodic pattern, which can be modelled with a 
DGLM, by setting r}t = F'Ot and Ot = GOt~i + uJt, where the design vector F has dimension 
5x1 and the 5x5 evolution matrix G comprises a linear trend component and a seasonal 
component. One way to do this is by applying the trend / full harmonic state space model 



and G 



110 

10 

cos(7r/2) sin(7r/2) 

-sin(7r/2) cos(7r/2) 

-1 



where G is a block diagonal matrix, comprising the linear trend component and the seasonal 
component, for the latter of which, with a cycle of c = 4, we have h = c/2 = 2 harmonics 
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Figure 1: Binomial data of 25 Bernoulli trials (solid line) and one-step forecast mean (dashed 
line) . 

and the frequencies are uj = 27r/4 = 7r/2 for harmonic 1 and u; = 47r/4 = vr for harmonic 
2 (the Nyquist frequency). Similar models, with Gaussian responses, are described in West 
and Harrison (1997), and Harvey (2004). The covariance matrix O of uJt is set as the block 
diagonal matrix Q = block diag(r2i, r22)) where fii = IOOO/2 corresponds to the linear trend 
component, ^2 = IOO/3 corresponds to the seasonal component and it is chosen so that 
the trend has more variability than the seasonal component (West and Harrison, 1997). The 
priors rriQ and Pq are set as mo = [0 0]' and Pq = IOOO/5, suggesting a weakly informative 
prior specification. Figure [1] plots the one-step forecast mean of {yt} against {yt}- We see 
that the forecasts fit the data very closely proposing a good model fit. 

3.1.2 Poisson 

In the context of generalized linear models, the Poisson distribution (Johnson et al, 2005) is 
associated with modelling count data (Dobson, 2002). In a time series setting count data are 
developed as in Jung et al. (2006). 

Suppose that {yt} is a count time series, so that, for a positive real-valued Aj > 0, yt\Xt 
follows the Poisson distribution, with density 

p(2/t|At) = exp(-At)^, yj = 0,1,2,... ; At > 0, 
where ytl denotes the factorial of yt- 



We can easily verify that this density is of the form ([T|), with z{yt) = yt, a{(pt) = 
jt = log At, 6(7t) = exp(7t), and c{yt,(t)t) = l/ut^-- We can see that E(yt|A) = db{-ft) / d'jt = 
exp(7t) = Xt and Var(?/j|At) = d'^b{-ft)/ 7? = exp(7t) = Aj. 

From the prior of 7i and the transformation 7^ = log Xt, we obtain the prior of Xt\y^~^ 
as a gamma distribution, i.e. Xt\y^~^ ~ G{rt,st), with density 

p{k\y'-') = 7^,y'-'eM-stXt), 

for rt, St > 0. Then it follows that the posterior of Xt is the gamma G{rt + yt, st + 1). 

For the definition of rt and st we use the logarithmic link g{Xt) = log At = i]t = F'Ot or 
At = exp(F'6t). Based on an evaluation of the mean and variance of log At and a numerical 
approximation of the digamma function (see appendix) , we can see 

1 , exp(-/t) 

rt = — and St = , (11) 

Qt qt 

where ft and qt are the mean and variance of r]t. 

For the computation of ft and Qt, the posterior mean and variance of 7t, first define the 
digamma function '0(.) as tp{x) = d\ogT{x)/ dx, where r(.) denotes the gamma function and 
of course x > 0. Then we have 



/t* = ^(rt + yt)-log(st + l) and g,* ^ "'^^'^^ 



dx 



x=rt+yt 



which can be computed by the recursions il^{x) = Tp{x + 1) — x"^ and d'il){x)/ dx = dip{x + 
1)/ dx + x~'^. Using the approximations ^^{x) = \ogx + {2x)~^ and dijj{x)/ dx = x~^(l — 
(2x)~^), we can write 

« , n + yt , 1 , , 2rt + 2yt-l 
ft ~ log — - + — ■ and Qt ^ — ■ r^. 

st + l 2{rt + yt) 2{rt + ytY 

With rt, St, ft and Qt we can compute the first two moments of 9t\y^ as in ([5]). For a detailed 
discussion on digamma functions the reader is referred to Abramowitz and Stegun (1964, 
§6.3). 

Defining rt{£) and St{i) according to ft{i) and qt{(^) and equation pT|) . the £-step forecast 
distribution of yt+e\y*^ is given by 



p{yt+i\y^) 



rt{i) + yt+i-l\ ( stii) V'^'^ f 1 Y^' 



yt+e J\l + st{i)J \l + st{e) 



which is a negative binomial distribution. The forecast mean and variance can be calculated 
by using conditional expectations, i.e. 

ytii) = E{yt+e\y') = E(E{yt+Mt+i)\y') = ^ 

and 

rt(^)(st(^) + l) 



Var(2/t+^|y*) = E(Var(yt+£|At+^)|2/*) + Var(E(yt+^|At+£)|?/ 



(stm 
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The power discounting yields 



rt+i = 6{rt + yt) + I - 5 and st+i = 6{st + I). 



Considering the random walk evolution for 6t so that rjt = Ot = Ot-i + uJt, where LUt ~ 
A^(0, 17), for some variance 0, we can see that 



since log At = r/t = 9t- Then from the normal distribution of tot, the distribution of Xt\Xt-i is 



which is a log-normal distribution (see Section [3. 2p . Tnen from ([7]) the log-likelihood function 
is 



Bayes factors can be calculated using ([8]) and the negative binomial one-step ahead forecast 
probability functions p{yt+i\y^). 

In order to illustrate the Poisson model we consider US annual immigration data, in the 
period of 1820 to 1960. The data, which are described in Kendall and Ord (1990, page 13), are 
shown in Figure [2l The nature of the data fits to the assumption of a Poisson distribution, 
but it can be argued that, after applying a suitable transformation, some Gaussian time 
series model can be appropriate. The data are non-stationary and a visual inspection shows 
that they exhibit a local level behaviour. One simple model to consider is the random walk 
evolution of ijt = 6t as described above. We use power discounting with 5 = 0.5, which is 
a low discount factor capable to capture the peak values of the data. Figure [2] shows the 
one-step forecast mean against the actual data and as we see the forecasts capture well the 
immigration data. 

3.1.3 Negative binomial and geometric 

The negative binomial distribution (Johnson et al, 2005) arises in many practical situations 
and it can be generated via independent Bernoulli trails or via the Poisson/gamma mixture. 
In time series analysis, an application of negative binomial responses is given in Houseman 
et al. (2006). We note that the negative binomial distribution includes the geometric as a 
special case (see below). 

Suppose that the time series {yt} is generated from the negative binomial distribution, 
with probability function 



where ttj is the probability of success and nt is the number of successes. This distribution 
belongs to the exponential family ([T]), with z{yt) = yt, a(</'t) = 4>t = ^, 7t = log(l — vtj). 



Kit) = -nt^ogil - exp(7t)), and c{yt,(t)t) = i^^n^li^)- Then it follows that ¥.{yt\-Kt) = 



Xt = exp(u;t)At_i 
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Figure 2: US annual immigration in thousands (solid line) and one-step forecast mean (dashed 
line) . 



db{^t)/ djt = nt{l - TTt)/iTt and Var(yt|7rf) = (fb{jt)/ d^"^ = nt{l - vrf)/7r|. We note that by 
setting rit = 1 and xt = yt — the time series xt follows a geometric distribution and thus 
all what follows applies readily to the geometric distribution too. 

By using the prior of 7t|y*~^ and the transformation 7^ = log(l — VTf), the prior of 7^t\y^~^ 
is the beta distribution 7rt\y^^^ B{ntSt + 1, rt) and the posterior of ■Kt\y* is the beta vr^ly* ~ 
B{ntSt + nt + 1, + yt). Using the logit link, as in the binomial example, the definitions of 
Tt and St are 

1 + exp(-/t) 1 + exp(/t) - qt 



rt 



and St 



and the posterior moments ft and ql are 



ntqt 



ft = tpintst + nt + 1) - ^(rt + yt) and qt 



dip{x) 



dx 



+ 



d'ip{x) 



x=ntst+nt+l 



dx 



x=rt+yt 



which can be approximated by 



1 ntst + nt + l , 
ft ~log : + 



1 



n + yt 2{ntst+nt + l) 2{rt + yt) 



and 



Qt 



2ntst + 2nt + l , 2rt + 2yt-l 



+ 



2{ntst + nt + 1)2 2{rt + yt 



\2 ■ 
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Thus we can compute the moments of 0t\y^ as in ([5]) and so we obtain an approximation of 
the quantities rt{i) and st{i), as functions of ft{£) and qt{(^)- 
The ^-step forecast distribution is given by 

. I r(rf (£) + nt+e + stji) + l)r(rt(£) + yt+e)T{nt+eSt{e) + ut+j + 1) fyt+e + rit+e - 1 
P[yt+i\y ) T{rt{i))T{nt+M^) + l)r(rt(^) + + nt+,st(£) + rit+i + I) \ Ut+e - 1 
The forecast mean and variance of y^^i are given by 

yt{i) = E{yt+e\y') = E{E{yt+e\7Tt+i)\y') = ^ 

and 

YaTiyt+e\y') = E(Var(yt+,|At+,)|y*) + Var(E(yt+,|At+,)|y*) 



st{i){nt+est{e) - I) nj^Mi?' 

The power discounting yields 

A/ , i\ , 1 A 6{ntst + nt) 

n+i = d{rt +yt-l) + 1 and St+i = , 

nt+i 

where as usual 6 is a discount factor. 

Considering the random walk evolution for rjt = 9t = Ot-i the link lognt(l — 7rt)/n( = 
rjt^ yields the evolution for vr^ 

= (12) 

7rt_i + exp(a;t) - iTt-i exp(a;t) 

Given that uJt ~ -^(0, il), for a known variance Jl, the distribution of 7rf|7r(_i is 



p{TTt\'Kt~i) = r—— -exp -— log 



1 / ■Kt-l{l-T^t] 



and so from ([7]) the log-likelihood function is 



2^ 



yt + nt-l 
nt-l 



^(VTI , . . . , TTt; 2/'^) = ^ f log(l - TTt ) + ?lf log TTt + log 

t=i ^ 

-logV2^..(l-..)--L(log -^-j(^---j ) 

Bayes factors can be computed using ([8]) and the predictive distribution p{yt+i\y^). 

To illustrate the above model we have simulated 100 observations from the above model; 
we simulate one draw from ttq ~ -6(2, 1) so that E(7ro) = 2, we simulate 100 innovations 
wi, . . . , wioo from a A^(0, 1), then using (fT2]) we generate vri, . . . , vrioo and finally, for each time 
t, we simulate one draw from a negative binomial with parameters = n = 10 and tt^. Figure 
[3] shows the simulated data (solid line) together with the one-step ahead forecast means rt/st- 
For the fit, we pretend we did not have knowledge of the simulation process and so we have 
specified F = [I 0]', G = n = h (the 2x2 identity matrix), mo = [0 0]', and Pq = IOOO/2, 
the last indicating a weakly informative prior specification (i.e. Pq^ ~ 0). We observe that 
the forecasts follow the data closely indicating a good fit. We have found that as it is well 
known for Gaussian time series, these prior settings are insensitive to forecasts, since prior 
information is deflated with time. 
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Figure 3: Negative binomial simulated data (solid line) and one-step forecast mean (dashed 
line) . 



3.2 Continuous distributions for the response yt 
3.2.1 Normal 

Normal or Gaussian time series are discussed extensively in the literature, see e.g. West and 
Harrison (1997) for a Bayesian treatment of Gaussian state-space models. Here we discuss 
Gaussian responses in the DGLM setting, for completeness purposes, but also because the 
normal distribution has many similarities with the log-normal distribution that follows. 

Suppose that {yt} is a time series generated from a normal distribution, i.e. ytlfJ^t ~ 
N{iJ,t,V), with density 



exp 



{yt - fj-tf 

2V 



-oo < yt,fit < oo; V > 0, 



where /it is the level of yt- The variance V of the process can be time- varying, but for simplicity 
here, we assume it time-invariant. Here, this variance is assumed known, while fit is assumed 
unknown. If V is unknown, Bayesian inference is possible by assuming that 1/V follows a 
gamma distribution and this model leads to a conjugate analysis (resulting to a posterior 
gamma distribution for 1/V and to a Student t distribution for the forecast distribution of 
yt+i)- This model is examined in detail in West and Harrison (1997, Chapter 4). Returning to 
the above normal density, we can easily see that p{yt\f^t) is of the form of ([T]), with z{yt) = yt-, 
a(0t) = <^r' = ynt = ^^t, K-it) = 7tV2 and c(yt, <At) = {2W)-^/^ exp(-y2/(2F). 

The prior for /it|?/*~^ is the normal distribution fit\y^~^ ~ N{rtS^ ^, Sj ^) and the posterior 
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of /Ut|y* is the normal distribution 

'rt + V"^yt 1 



The hnk function is the identity hnk, i.e. g{fJ-t) = fJ-t and so we have fit = Vt = F'9t, which 
imphes rt = ft/qt and st = 1/qt- By replacing these quantities in the above prior and posterior 
densities, we can verify the Kalman filter recursions. 

It turns out that the £-step forecast distribution is also a normal distribution, i.e. 



The power discounting yields 

rt+i=6\rt + V~^yt) and st+i = d\st + V~'). 

Adopting the random walk evolution for 9t = Ot-i +uJt, from the identity link fit = ilt = ^t; 
we have that ^tlf^t-i ~ ^^{l-H-i, where uJt ~ -^(0, 0,). From ([7]) the log-likelihood function 
is 

rp 

...,fir-y) = Y: (^i^ytf^t - ^l) - log ^/i^ - ^ - ^^^S^) • 

t=i ^ ^ 

Bayes factors can be easily computed from the forecast density p{yt+i\y^) and the Bayes factor 
formula dS]). 

3.2.2 Log-normal 

The log-normal distribution has many applications, e.g. in statistics (Johnson et al, 1994), 
in economics (Aitchison and Brown, 1957), and in life sciences (Limpert et al, 2001). 

Suppose that the time series {yt} is generated from a log-normal distribution, with density 



1 f (log yt - A; 



\2 



p{yt\Xt) = ^^^-^ exp 1^ — J , yt>0; -oo < A* < oo; V > 0, 

where logyt\Xt ~ N{Xt,V). We will write yt\Xt ~ LogN{Xt,V). This distribution is of the 

form of (P, with z{yt) = logyt, a{(pt) = (pt^ = V, 7t = Xt, bi'jt) = 7?/2 and c{yt,(t)t) = 

(27ry)-V2y-iexp(-(logyOV(2^)). 
From the normal part we can see 

E(logyj At) = — = Xt 

ajt 

and from the log-normal part we can see 

E{yt\Xt)=exp{Xt + V/2) = fit 
from the latter of which the logarithmic link can be suggested, i.e. r]t = log fit = A^ -|- V/2. 
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From the normal distribution of log yt, it follows that the prior distribution of At|y* ^ is 

\St St 

and the posterior distribution of At|y* is 

Vt + y^Mogyt 1 



St + ' St + 



where rt and st are calculated as in the normal case, i.e. rt = ft/ (It and st = 1/qt- With the 
definitions of rt {£) and st {(.) , we have that the £-step forecast distribution of yt+£ is 



stii)' stii) 



The forecast mean of yt+e is 



(f^ W( I ^ I \ (V\ f 2ft{£) + qt{i) + V 

where ft{i) and qt{^) are the respective mean and variance of tjt+i, given information y*. 
Considering power discounting, the updating of rt and st is 

rt+i=S^{rt + V-Hogyt) and st+i = S\st + V~^). 

Adopting the random walk evolution for r]t = 9t = Ot-i + wj, the distribution of Xt\Xt-i is 
normal, i.e. Xt\Xt^i ~ A^(At_i, O), where i7 is the variance of ujt. Prom ([7]) the log-likelihood 
function is obtained as 

e{Xi,...,XT;y^) = Yl (^(2Atlogyi - A?) - log V4^V^^ - logy* 

jiogyt? , {Xt-Xt^ir 

log 

Bayes factors can be calculated from ([8]) and the log-normal predictive density As 
an example, consider the comparison of two models Mi and M2, which differ in the variances 
Vi and V2, respectively. Then, by denoting ri(, sit-, r2t and S2f, the values of r^, s^, for Aij 
[j = 1,2), we can express the logarithm of the Bayes factor Ht{l) as 

, 1, V2 + S2,t+l~^ (logyt+i -r2,t+iS2]+i)^ {\ogyt+i - ri^t+iSi}+if 
logHt(l) = - log 1 ^ ^ 1 ^ . 

By comparing \ogHt{l) to 0, we can conclude preference of Aii or Ai2, i-e. if logi7((l) > 
we favour A^i, if logi7t(l) < we favour A4i, while if logHt{l) = the two models are 
equivalent, in the sense that they both produce the same one-step forecast distributions. 

To illustrate the above DGLM for log-normal data we consider production data, consisting 
of 30 consecutive values of value of a product; these data are reported in Morrison (1958). 
A simple histogram shows that these data are positively skewed and it can be argued that 
the data exhibit local level time series dependence. Morrison (1958) show that modelling 
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Table 1: Mean square error (MSE) and Log-likelihood function for several values of the 
discount factor 5 for the log-normal data. 



s 


0.2 


0.3 


0.4 


0.5 


0.6 


0.7 


0.8 


0.9 


0.99 


MSE 


103.75 


13.34 


3.16 


2.22 


2.72 


3.37 


3.93 


4.34 


4.57 




-35.26 


-35.28 


-35.34 


-35.44 


-35.61 


-35.86 


-36.2 


-36.60 


-36.93 




1 I I I I I 

O 5 10 15 20 25 30 

time 



Figure 4: Log-normal data (solid line) and one-step forecasts (dotted line) for 5 = 0.5. 

these data with the normal distribution can lead to inappropriate control. Here we use the 
power discounting approach to update rj and st] Tabled] shows the mean square forecast error 
(MSE) and the value of the log-likelihood function evaluated at the posterior mean E{Xt\y^) 
for a range of values of 5. The result is that 5 = 0.5 produces the smallest MSE, while the 
likelihood function does not change dramatically. Figure H] plots the one-step forecasts for 
5 = 0.5 against the actual data. Although the extreme value y29 = 9.48 is poorly predicted, 
we conclude that the overall forecast performance of this model is good, especially given the 
short length of this time series. 

3.2.3 Gamma 

The gamma distribution (Johnson et al, 1994) is perhaps one of the most used continuous 
distributions, as it can serve as a model for the variance or precision of a population or 
experiment. In particular in Bayesian inference it is a very popular choice as the conjugate 



16 



prior for the inverse of the variance of a linear conditionally Gaussian model (see also the 
discussion of the normal distribution above) . 

Suppose that {yt} is a time series generated from a gamma distribution, with density 

p{yt\at,Pt) = —^Vt'-^expi-Ptyt), yt > 0; at,Pt > 0. 
r(at) 

This distribution is referred to as yt\at,Pt ~ G{(^t-,Pt)- Our interest is focused on f3t and so 
we will assume that at is known a priori. Thus we write p{yt\at, (it) = p{yt\Pt)- 

The above gamma distribution is of the form of ([1]), with z{yt) = yt, o-{(t>t) = 4>t = 
-it = -(it, b{jt) = - log((-7t)-7r(Qt)) and c{yt,^t) = yT~^ ■ 

It follows that 

= ^ = f = > 

and 

Var(,,|/?,) = = ^. 

The prior and posterior distributions of (it are gamma, i.e. (it\y^~^ ~ G[atSt + and 

~ G[atst + at + 1, n + yt). 
Since [it > 0, the logarithmic link is a appropriate, i.e. g{fJ-t) = log/it = Vt = F'9t- Then 
rt and are defined in a similar way as in the Poisson case, i.e. 

exp(-/t) 1 - 

Tf = and St = , 

qt atqt 

where atSt + 1 > 0. The posterior moments of log/xj are given by 

di({x) 



ft = '^{atst + yt + l)- log(n + 1) and 



which can be approximated, as in the Poisson case, by 



dx 



x=atst+yt+l 



ft ~log + \ — TT and 



rt + l 2{atst + yt + l) 2{atSt+yt + iy 

With the definition of rt{i) and st{t), the ^-step forecast distribution is 

P{yt+t\y)- T{rt{i))T{at+^) ^*+^ + yt+i) 

The mean and variance of this distribution can be obtained by conditional expectations, i.e 



yt{^)=nyt+i\y') = nnyt+i\(it+i)\y'' - '''^^^ 



st{i) 
and 

\aT{yt+i\y') = E(Var{yt+i\(it+i)\y') + \av{E{yt+i\Pt+i)\y') + 



st{i)Hat+M^) - 1) 
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The power discounting yields 



6atst + 6at 

n+i = d{rt + yt) and st+i = 



at+i 



From the logarithmic link function we have f3t = on/ exp{r]t) and if we consider a random 
walk evolution for rjt = 9f = 0f„i + Wj, we obtain the evolution of Pt as 

a _ OLtl3t-i 
Pt 



which together with the normal distribution of ~ A^(0, O), results to the distribution 

1 / (log Pt - ata^\ log A-i)^ \ 



p{Pt\Pt-i) = ^==— exp 



V2^Pt \ 20 y ' 

which is the log-normal distribution Pt\Pt-i ~ LogN{ata^\ \ogPt~i, Note that the above 
expressions can be simplified when at = a \s time- invariant. Model comparison and model 
monitoring can be conducted by considering the Bayes factors, which can be computed from 
dH]) and the predictive density piyt+tW^)- 

Bayes factors can be computed using ([8]) and the predictive distribution p{yt+i\y*') . Here 
we give two examples, both of which are using the power discounting approach. In the first 
we consider two competing models Mi and A^2! which differ in the discount factors 5i and 
(52, respectively, but otherwise they have the same structure. Then, if we denote and su 
the values of rj and st for model Aii {i = 1, 2), then the Bayes factor Ht{'\.) can be expressed 
as 

'•°^^l+^r(asi,f+i +a+ l)(ri,4+i + yi+i)-(°^i.'+i+°+i)r(r2,t+i) 



Htil) 



"°j^l+T(as2,i+i + a + l)(r2,t+i + yt+i)-(°'^2.*+i+a+i)r(ri,t+i) ' 



where, for simplicity we assume that at = a \s invariant over time and known. 

In the second example we consider a fixed discount factor 6i = 82 = 8, but now the two 
models Mi and M2 differ in the values of a, namely ai and 02- Then we can see that 
n = rit = S{rt-i + yt-^i) and st = su = {6asi^t-i + Sa)/a = 6st-i + 6, since n and st do not 
depend on Oj (note that this would not be the case if Qj were time- varying) . Then the Bayes 
factor of A^i against M2 can be expressed as 

_ /^+iK-°2) "i-"2.„^^ I ,,A(st^i+i)(»2-ai) r(a2)r(ai^f+i + ai + 1) 

r(ai)r(a2St+2 + "2 + 1) 

Thus, by comparing -fff(l) with 1, we have a means for choosing the parameter a. 

To illustrate the gamma distribution we give an example from finance. Suppose that 
yt represents the continually compound return, known also as log-return, of the price of an 
asset, defined as yt = log pt — logpt^i, where pt is the price of the asset at time t = 1, . . . ,T. 
In volatility modelling, one wishes to estimate the conditional variance of yt- This plays 
an important role in risk management and in investment strategies (Chong, 2004), as it 
quantifies the uncertainty around assets. A classical model is the generalized autoregressive 
heteroscedastic (GARCH), which assumes that given at, yt follows a normal distribution, i.e. 
ytlo't ~ A^(0, CTj ) and then it specifies the evolution of as a linear function of past values of 
cjf and yt - GARCH models are discussed in detail in Tsay (2002). 
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Table 2: Comparison of the gamma model with ARCH and GARCH models. Shown are the 
log-likelihood functions of the models, using the IBM data. 



model 


gamma ARCH(l) 


ARCH(2) 


ARCH(3) 


ARCH(4) 




-241.07 -2133.79 


-2123.10 


-2115.11 


-2110.93 


model 


GARCH(1,1) 


GARCH(1,2) 


GARCH(2,1) 


GARCH(2,2) 


£{.) 


-2109.33 


-2125.05 


-2130.86 


-2123.74 



From yt\(7t ~ -^(0, we can see that, given at, Ut/f^t follows a chi-square distribution 
with 1 degree of freedom or a G(l/2, 1/2). Thus y'^\at ~ G(l/2, 1/2) = G{l/2,l/{2a'^)). 
Then by defining at = 1/2 and (3t = 1/(2(T^), we have that yt\(3t ~ G{l/2,(3t) and so we can 
apply the above inference of the gamma response. Assuming a random walk evolution for 
rjt = 6t, we have 

exp(u;j) 

where uJt is defined above. 

We note that from power discounting we have rt = 6rt-i + ^vl-i = Z^i=i ^^Ut-i 
St = 6st-i + 6 = Yll=i — "^(1 ~ ~ Thus the one-step forecast mean of is 

st_i(l) St (>{^-(>)~( 

From the prior of Ptlv^"^, we can see that l/dj ~ G{{st + 3)/2,rt/2) and so cr^\y*^^ 
follows an inverted gamma distribution, i.e. (Jtly*^^ ~ IG{{st + 3)/2,rt/2). Similarly, we can 
see that the posterior distribution of l/af and at are 1/crf |y* ~ Giist + 3)/2, {rt + yt)/2) and 
~ IG{{st + 2>) /2, {rt +yt)/2), respectively. From these distributions we can easily report 
means, variances and quantiles, as required. 

We consider log returns from IBM stock prices over a period of 74 years. These data, 
which are described in Tsay (2002, Chapter 9), are plotted in Figure [5l Figure [6] shows the 
posterior estimate of the volatility at = K{at\y^)- We can see that the volatile periods are 
captured well, e.g. the first 120 observations in both figures indicate the high volatility. The 
model performance can be assessed by looking at the log-likelihood function of /3t = l/(2<7t ), 
evaluated at the posterior mean a^. The log-likelihood is 



t ' 

T 



£(/?!, . . . , y^) = - ^ \0g{2n^^) - log vl " ^ Y.^\og A - log (5t-l, 



2 



t=l t=l 



where is the variance of cot (the innovation of the random walk evolution of r/t = 9t). Here 
T = 888 and with 5 = 0.6 and O = 100, we compare this model with several ARCH/GARCH 
models. Table [2] shows the log-likelihood function of our model compared with those of the 
ARCH/GARCH. We see that our model outperforms the ARCH/GARCH producing much 
larger values of the log-likelihood function. 

Inference and forecasting for the inverse or inverted gamma model is very similar with the 
gamma model. For example suppose that given at and Pt, the response yt follows the inverse 
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Figure 5: Log-returns of IBM stock prices. 
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gamma distribution yt ~ IG{at, l3t), so that 



pat ^ / Pt\ 

r(at)y"*+' V ytj 

Given at (as in the gamma case), the above inverse gamma distribution is of the form of ([I]), 
with z{yt) = 1/yt, a(0t) = 0^ = 1, 7^ = -Pt, b{jt) = - log((-7t)°Vr(at)) and c{yt,<j)t) = 
yt The prior distribution for Pt is the gamma Pt\y^~^ ~ ^(0^54 + 1, rj) and the posterior 

distribution is the gamma ~ G{atSt + l,rt + yt^)- Thus the above prior is the same as 
in the gamma model and the posterior changes shghtly. As a result inference and forecasting 
for the inverse gamma follows readily from the gamma distribution. 

3.2.4 WeibuU and exponential 

The exponential and the Weibull distributions can be used in survival analysis, for example, 
in medicine, to estimate the survival of patients, or in reliability, to estimate failure times of 
say a manufacturing product. The exponential distribution is a special case of the Weibull 
and for a discussion of both, the reader is referred to Johnson et al. (1994). 

Suppose that the time series {yt} is generated by a Weibull distribution, with density 
function 

p{yt\\t) = Y/t'"^ exp (-^) , > 0; \u yt > o. 

Here we assume that ut is known and we note that for z^^ = 1 we obtain the exponential 
distribution with parameter 1/Xt- The above distribution is of the form of ([T|), with z{yt) = 
yT^ a{4>t) = = 1, 7t = -1/At, 6(7t) = - log{-ut'yt) and c{yt,<j)t) = yT~^ ■ 
Given Af, the expectation and variance of y^' are 

db{it) , 



nyT\h) 

and 

Var(y/|At) = = ^t- 

Since Xt = fJ.t > 0, the logarithmic link g{Xt) = log Aj = r]t can be used. 

The prior and posterior distributions of At are inverted gamma, i.e. At|y*~^ ~ IG{st — l,rt) 
and Atly* ~ IG{st,rt + y'^') so that l/Xt\y^-^ ~ G{st-l,rt) and l/Xt\y^ ~ G(sj,rt + yf ), e.g. 



^^'*'^"^ = f(^Af^^n-A; 

Since the link is logarithmic and the prior /posterior distributions are inverted gamma, by 
writing log Xt = — log A^^, the approximation of rt and st follow from a similar way as in the 
Poisson, i.e. 

exp(/t) 1 + qt 

rt = and St = 

qt qt 

and the posterior moments of log At are given by 

/; = ^(st + yr-l)-log(n + l) and q*t = ^^ 
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which can be approximated by 

ft - log — + — and Qt w — -. 

n + l 2[st + yt'-l) 2(st + ?/j*-l) 

With the usual definition of rt{i) and st{l) and their calculation via ft{i), Qt{^) and the 
above equation, we obtain the ^-step forecast distribution of yt+e as 

rt{iy'(-^^-^y^l+/~\st{£) - 1) 

piy^^^iy ) = — irm+yZrr^^'^ — ' ^''^ 

Using conditional expectations, we can obtain the forecast mean and variance of y^!^^/ as 
for st{t) > 2 and 

Var(yr;ri2/*) = E(Var(2/r-/|A,+,)|y*) + Var(E(2;rr/|A,+,)|y*) = 3^ , 

for st{e) > 3. 

Considering a random walk evolution for i]t = 9t = Ot-i + oJt, from the logarithmic link, 
we obtain 

At = exp(u;t)At„i (14) 

and so At|At_i ~ Log N {log Xt-i,^^), where ft is the variance of fl. The derivation of this 
result is the same as in the Poisson example. 

From ([7|) and Ai|Ai_i ~ LogN{log Xt-i, the log-likelihood function of Ai, . . . , Xt, based 
on data = {yi, . . . , yr} is 

... . T, \-(yt', A, , log(2^11) , {\ogXt-\ogXt-if \ 
i{Xr,...,XT;y ) = - ^ (^^ + log - + (1 - logy* + + ^ )■ 

Power discounting yields 

n+i = 6{rt + yj") and st+i = 6{st + 1). 

We consider model comparison for the Weibull distribution when r]t = FOt and 9t = 
9t-i + u!t, for some scalar F. This is an autoregressive type evolution for ijt. We specify the 
variance of ivt with a discount factor (West and Harrison, 1997, Chapter 6) as Yar{ujt) = fit = 
(1 — 5)Pt~i/5, where Pt is the posterior variance of 0t|y*. The density of yt\y^~^ is given by 
([H]), for ^ = 1, rt_i(l) =rt = exp{ft)/qt and St-i(l) = St = (1 + qt)/qt, where ft = Fmt-i, 
qt = F'^Pt-i/5 and mt, Pt are updated from ([5]) as 

1 st + yt-i , 1 

mt = log — h 



n + 1 2{st + yt-l) 

and 

„ Pt^i P?-i 2st + 2yt - 3 ^ 1 2st + 2yt - 3 

-T* — — ; — 



5^ \ 2{st + yt-l)qt) qt 2{st + yt-l)F^' 
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Table 3: Log-likelihood function and mean H{1) of the Bayes factor sequence {Ht{l)} of 
Ml (with di) against M2 (with 62)- 





«(•) 








H{1) 








Ol \02 




0.99 


n Q 


u.o 


n 7 


u.u 


n ^ 

u.o 


0.99 


-5.787 




1 


0.997 


0.995 


0.994 


0.994 


0.998 


0.95 


-7.411 


1 


001 


0.999 


0.997 


0.995 


0.995 


0.999 


0.90 


-3.123 


1 


002 


1 


0.998 


0.996 


0.996 


1 


0.85 


-8.547 


1 


004 


1.001 


0.999 


0.997 


0.997 


1.001 


0.80 


-8.854 


1 


005 


1.002 


1 


0.998 


0.998 


1.002 


0.75 


-9.098 


1 


006 


1.003 


1.001 


0.999 


0.999 


1.002 


0.70 


-9.301 


1 


007 


1.004 


1.002 


1 


0.999 


1.003 


0.65 


-9.476 


1 


008 


1.005 


1.002 


1 


1 


1.003 


0.6 


-9.631 


1 


008 


1.005 


1.003 


1.001 


1 


1.003 


0.55 


-9.771 


1 


008 


1.005 


1.003 


1 


0.999 


1.002 


0.50 


-9.947 


1 


007 


1.004 


1.001 


0.998 


0.997 


1 



We consider now the situation of the choice of 6. Suppose we have two models A4i with a 
discount factor 61 and M2 with 82 and otherwise the models are the same. The Bayes factor 
from a single observation (k = 1) is given by 

^ ^ rir\su~i){r2t+yrr' 

r',r\s2t-l){ru+yn''^' 

where rjt and sjt are defined as rt and sj if we replace 5 by 6j, for j = 1, 2. 

For illustration, we simulate 500 observations from a WeibuU distribution with ft = 3 and 
{At} being simulated from (jl4p . where we have used F = 1, Xq = 1 and LVt ~ N{0, 1). Figure[7| 
shows the simulated data. In order to choose the discount factor 5, we apply the Bayes factor 
Ht{l) over a range values of 5i, (52 ^ 0.5. We have used rriQ = and a weakly informative prior 
Pq = 1000. Table [3] reports on H{1), the mean of Ht{l), and on the log-likelihood function 
i{Xi, . . . , X^oolu^^^) evaluated at At = (rt + y'^^)/st (see the posterior distribution of At|y*). 
This table indicates that there is little difference in the performance of the one-step forecast 
distribution, under the two models. The log-likelihood function clearly indicates that 5i = 0.9 
produces the model with the largest likelihood. The deficiency to separate the models using 
the Bayes factor criterion, indicates that, in a sequential setting which is appropriate for time 
series, one should better look at the Bayes factor for each time t and not at the overall mean 
of the Bayes factor. Figure [8] shows the Bayes factor of Ali (with 61 = 0.9) against M.2 
(with 62 = 0.7). We see that, although the mean of the Bayes factor is 0.996 (see Table [3]), 
at t = 1 — 50 and t = 100 — 200, there can be declared significant difference between the 
two models, which is slightly in favour of model A4i. This effect is masked when one looks 
at the overall picture, considering the mean H{1), and it indicates the benefit of sequential 
application of Bayes factors. 

The exponential and Weibull distributions are useful models for the analysis of survival 
times data. In the context of DGLMs, we have dynamic survival models due to Gamerman 
(1991). Here we give a brief description of dynamic survival models and we extend a result 
of Gamerman (1991). 
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time 



Figure 7: Simulated data from a Weibull distribution with f( = 3 and At generated from ([T 



Suppose that, given vt and At, the survival time yt follows the Weibull distribution p(yt| At) 
(here we assume that vt is known and so we exclude it from conditioning). For example, if 
the exponential distribution is believed to be an appropriate model, we have ft = 1- The 
survivor function of the Weibull distribution is 



^(ytlAt 



At 



exp 



yt 



At 



dut = exp 



At 



(15) 



Xp\ and we 



Suppose we have a vector of p regressor variables or covariates x = [xi ■ 
consider a vector of parameters (3 so that 1/At is proportional to exp(x'/?). Then the hazard 
function h{yt;i't, ^t) = h{t) oc vtyt^~ exp(xj/?) and this leads to the proportional hazards 
model with h{t) = /io(i) exp(x'/?), where /jo(i) is the baseline hazard function (Dobson, 2002, 
§10.2). So one can write \ogh[t) = log/io(t) + x' (3 and considering a partition of (0, A^) as 
^ = Vq < Vi < ■ ■ ■ < Vt = N so that t £ It = iyt-i,yt], we write log/io(t) = at, i.e. the 
baseline is a step function that takes a constant value at at each time interval It- 

Now in the DGLM flavor, dynamic survival models assume that P evolves over time 
between intervals Ii,...,It, but it remains constant inside each interval If Gamerman 
(1991) considers the model 



logAp^ = log h^^\t) = F'e 



1, 



1 T 



(16) 



where Fj = [I x'j]' is the design vector and 9t = [at PtY is the time-varying parameter vector, 



which is assumed to follow a random walk evolution according to 



^t-i + i^t, and At has 

been modified to X).'" to account for individual j. Here, t indexes the T intervals Ii, . . . ,It of 
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Figure 8: Bayes factor {Ht{l)} of model Mi with 6 = 0.9 vs model M2 with 62 = 0.7. 



(0, A^) and j indexes each individual to be alive at the beginning of It, where it is the number 
of such individuals in It. Note that through Xj, each individual j may have different effects 
through different regressor variables, although it is not unrealistic to set xj = x or F = [1 x']' 
(for all individuals we have the same regressor variables). The dynamics of the system is 
reflected on the dynamics of Ot- Equations (jlSp and (|16p define a dynamic survival model, 
which Bayesian inference follows, in an obvious extension of the DGLM estimation, providing 

the posterior first two moments of h^^\t) (details appear in Gamerman, 1991). 

(7) 

Fix individual j and write = Aj. Given the adopted random walk evolution for Ot, 
for any yl G It = {yt-i,yt], the prior X^^\y^~^ ~ G{st — l,rt) combines with the survivor 
function (fTSj) to give the survivor prediction 



S{y:\y'-') = / S{{y*t-yt-i)\Xt)p{Xt'\y'-')d\7' 
Jo 

r(si - 1) 

(yt-yt-iry^'^-'^ 



/•oo 

/ A7(^-^) exp(-((y; - yt^ir + rt)X7') A"' 
Jo 



1 + 



n 



where we can see that for ut = 1, we obtain the survivor prediction of the exponential 
distribution, reported in Gamerman (1991). Thus 5(yj*|y*~^) predicts the remaining survival 
time of individual j still alive. 
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3.2.5 Pareto and beta 



The Pareto (Johnson et al, 1994) is a skewed distribution with many apphcations in social, 
scientific and geophysical phenomena. For example, in economics it can describe the allocation 
of wealth among individuals or prices of the returns of stocks. 

Suppose that the time series {yt} is generated from Pareto distribution with density 

piyt\Xt) = hy;^'~\ yt > i; A* > 0. 

This distribution is also known as Pareto(I) distribution and is known as the index of 
inequality (this distribution is examined in detail in Johnson et al., 1994). The above distri- 
bution is of the form of with z{yt) = logyt, a{(f)t) = = 1, 7t = -Xt, h{'^t) = - log(-7i) 
and c{yt,(j)t) = We note that by setting xt = l/yt or xt = 1/(1 — yt), we have that 

< < 1 so that, given Xt, xt follows a beta distribution with parameters At,l and l,Af, 
respectively. Thus inference for the Pareto distribution can be readily applied to the beta 
distribution (Johnson et al., 1994) when at least one parameter of the beta distribution is 
equal to 1. This is a useful consideration as we can deal with responses being proportions or 
probabilities. 
We have 

lE(ytlAt) = = {Xt>l) and \^T{yt\Xt) = t-^^t ^ (A* > 2). 

At - 1 {Xt-lY{Xt-2) 

Since nt > 0, the logarithmic link function can be used, so that g[^t) = log ^it = log A^ — 
log(A( — 1), for A( > 1. Using the transformation 7^ = —Xt, we find that the prior and posterior 
distributions of are gamma, i.e. At|y*~^ G[st + l,Tt) and At|y* ~ G{st + 2,rt + logyt), 
respectively. 

Following the approximation of and st in the Poisson case, we have that 

exp(-/t) l-qt 

rt = and St = 

qt qt 

and the posterior moments of log A^ are given by 

dip{x) 



ft = '^{st + log yt + 1) - log(rj + 1) and 



dx 

which can be approximated by 



x=st+log yt+1 



st + logyt + l , 1 , , 2st + 2logyt + l 

tf K, log and q, = — . 

* n + l 2{st+\ogyt+l) ^* 2{st + \ogyt + l) 

Power discounting yields 

rt+i = 5{rt + \ogyt) and st+i = 5{st + I). 

With rt{l) and st{tj computed from ft{tj and qt[tj and the above equations of rt and st, 
the £-step forecast distribution of ytJ^i is 

, .t. rt(^)^'W+n^*(^) + l) 
P[yt+e\y ) - 



yt+e{rt{l) + log yt+e)'*(^y 
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Considering a random walk evolution for rjt = 6t = 6t-i + wt, we have that the evolution 
of \t is 

^ _ At-i exp(u;t) 
* At„i exp(cjt) - At_i + 1' 

from which we can obtain the distribution of \t\\t-i- With this, assuming that uJt ~ A^(0, 
and that At > 1, the density of At|At_i is 

(\\\ \ 1 ( 1 A At(At„i 

p(At|At_i) = — ^ — exp -— log ■ 



V2^At(At-l) I 20 V At_i(A: 




where Vt should be chosen so that to guarantee At > 1, for all t. Then from ([7]) the log- 
likelihood function is 



^(Ai, . . . , Ar;y'^) = ^ ( - At logyt + log At - logyt 
t=i ^ 

-log ./SOMA. (log M^i^)^), 

for Ai, . . . , At > 1. 

Bayes factors can be computed from the predictive density p(yt+i|y*) and dS]). As an 
example consider the comparison of two models TWi and A^2) which differ in some quantitative 
aspects, e.g. in the discount factor 5 (see also the illustration that follows). By defining rjt 
and Sjt the respective values of rt and st, for model Mj {j = 1,2), the Bayes factor Ht{l) 
can be expressed as 



\tTl^\si,t+i + l)(r2,t+i + logyt+i)^^''+^+^ 
'■2't+i'^'(^2,t+i + l)(ri,t+i + logyt+i)^^''+i+^ ■ 



To illustrate the above Pareto model for time series data, we consider the data of Arnold 
and Press (1989), consisting of 30 wage observations (in multiples of US dollars) of production- 
line workers in a large industrial firm; the data are also discussed in Dyer (1981). The data 
are shown in Figure [H from which two points can be argued it: (a) the data appear to be 
autocorrelated (in fact it is easy to run a corrolagram to justify this) and (b) the data exhibit 
a local level behaviour (one could argue for local stationarity, but with only 30 observations 
a local level model seems more appropriate). Here we apply the Pareto model with rt and 
St being updated by the power discounting (this is appropriate for the local level behaviour 
of the time series) . Table H] shows the mean of the Bayes factors for various values of the 
discount factors 6i and 62 in the range of [0.5,0.99]. It is evident that the best model is the 
model with 5 = 0.99, which is capable of producing Bayes factors larger than 1 as compared 
with models with lower discount factors. From that table it is also evident that models with 
low discount factors do worse than models with high discount factors and so by far the worst 
model is that using 5 = 0.5. Figure [TO] shows the values of the Bayes factor of the model with 
5 = 0.99 against the model with 5 = 0.95; we note that all values of the Bayes factor are 
larger than one and there is a steady increase in the Bayes factors indicating the superiority 
of the model with 6 = 0.99. 
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Figure 9: Annual wage Pareto data. 



3.2.6 Inverse Gaussian 

The inverse Gaussian or Wald (Chhikara and Folks, 1989; Johnson et al, 1994) is a skewed 
distribution that can describe phenomena in economics and in many other sciences. This 
distribution is known as the first passage time distribution of Brownian motion with positive 
drift. Recently, Huberman et al. (1998) used an inverse Gaussian distribution to model 
internet flow and internet traffic. 

Suppose that the time series {yt} is generated from an inverse Gaussian distribution, that 
is for given fit and At, the density function of yt is 



^(yil^t, Aj) = ^/-^exp ( ) , yt>0; fitAtyO. 



This is a unimodal distribution, which converges to the normal distribution, as \t — > oo. To 
the following we will assume that is a known parameter and interest will be placed on /i^; 
hence we write p{yt\lJ't, Aj) = p{yt\iJ,t)- We can see that the above distribution is of the form 
of (dl), with z{yt) = yt, (pt = Xt, a{4>t) = 2/Xt, -ft = -1//^*, &(7t) = -2M = -2^/^ and 
c{ytAt) = (At/(27r?/^))^/^ exp(-At/(2yt)). Then we can verify that 

«7t V-7t 



and 

d%{^t) _ a{<pt) _ f4 



Var(2/t|/it) = a{^tj ^ ^ 
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Table 4: Mean H{1) of the Bayes factor sequence {Ht{l)} oi Mi (with 6i) against M2 (with 
62) for the Pareto model. 









H{1) 










n QQ 




n 8 

u.o 


n 7 


u.u 




0.99 


1 


1.950 


3.484 


5.414 


7.786 


10.798 


0.95 


0.749 


1.401 


2.449 


3.774 


5.409 


7.489 


0.90 


0.559 


1 


1.708 


2.608 


3.721 


5.141 


0.85 


0.439 


0.760 


1.276 


1.931 


2.745 


3.785 


0.80 


0.358 


0.605 


1 


1.503 


2.129 


2.931 


0.75 


0.299 


0.496 


0.810 


1.211 


1.711 


2.350 


0.70 


0.254 


0.415 


0.672 


1 


1.408 


1.932 


0.65 


0.218 


0.352 


0.566 


0.839 


1.179 


1.616 


0.60 


0.189 


0.302 


0.482 


0.712 


1 


1.368 


0.55 


0.164 


0.261 


0.414 


0.609 


0.854 


1.167 


0.50 


0.143 


0.225 


0.356 


0.523 


0.732 


1 



The canonical link maps fit to 7^, or g{fit) = It = > but this is not convenient, since 

g{fjit) < and hence we need to find an appropriate definition of F and G in the state space 
representation of g{iJtt) = Vt iii order to guarantee — 00 < rjt < 00. The logarithmic link, 
gint) = ^og fit, seems to work better, since it maps fit to the real line and so F'Ot = rft = gifJ-t) 
is defined easily. 

The prior distribution of fit can be defined via the prior distribution of jt and the trans- 
formation 7t = —l/fit- In the appendix it is shown that 

My^-') = ^^"P(^^Vn)r. ( Jrt-f.tst? \ 

{exp{sf/rt)st^/^^/rt + l)fit V nf^t J 

In the appendix it is shown that 

nf^tly'-') = ^^"P^^/^^ . (18) 

ex.p{sf/rt)st^y^T/rt + 1 
The posterior distribution of fit is obtained from the posterior distribution of jt as 

p[fit\y) = K.[rt + Xtyt, St + At) exv \ o ^ ]^ 

\ K f^t J fit 

2exp((st + At)V(rt + XtVtWt + XtVt) 

(exp((st + XtY/in + \tyt)){st + Xt)V^/in + kyt) + l)^*? 

, irt + Xtyt- fJ'tist + Xt)f^ 
X exp ' 



{rt + hyt)fJ'i 
where in the appendix it is shown that 



n{rt,st)=rt ( exp 
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Figure 10: Bayes factor {Ht{l)} of model A^i with 6 = 0.99 vs model A^2 with 62 = 0.95 for 
the Pareto data. 



The approximation of rt and st is difficult, since the moment generating function of rjt = log fit 
(which is needed in order to compute rt and st) is not available in close form. Thus power 
discounting should be applied. From the posterior of 7t|?/*, given by (jl]), we have 

{pi-ft\y'))' oc exp (^6 (^rt + ^) 7t + 26 (^st + ^) V^t 

and so from the prior of 7^+1 (equation ^) and the power discounting law we obtain 

5irtXt + 2yt) , 5{stXt + 2) 

n+i = ^ and st+i = 



Xt " Xt 

With rt{£) = rt+i and st{i) = st+i, the £-step forecast distribution of yt+i\y^ is 



P(.yt+£\y^) = c{rt+i + 2yt+i) ^ 



1 



: exp 



A 



t+e 



2yt+£ 



st+iXt+e + 2 
Xt+e 



X exp 



/ {st+iXt+i + 2)"' 



Xt+eTT 



\Xt+i{rt+iXt+e + 2yt+i)) y n+iXt+i + 2yt+e, 
where the normalizing constant c is 



1 S(+i exp 



n+ij V ^t+i 



TT 



+ 1 



30 



The i-step forecast mean can be deduced by ([T8|) as 



^iyt+i\y ) = E(E(yi+£|//t+^)|y ) = ¥.{fit+e\y ^ - 



Of course the above power discounting specifies rt and st, for a random walk type evolution 
for the prior p!7|) . Following this, we can specify log /it = rjt = 9t = 9t-i + wt, with cut ~ 
iV(0,r2), and so 

/it = /tt-i exp(u;t), 

which leads to the density 

1 (log/it -log/it~i^^ 

p(/it|/if-i) = ^— — exp 



V2^/if V 217 
Therefore, using ([7]), the log-likelihood function is 

£(//!,..., /iT;y^) = ^ --^(2/if-yt)+lo: 




Bayes factors can be easily computed from p(yt+i|y*) and the Bayes factor formula ([5]). 

To illustrate the inverse Gaussian distribution we consider data consisting of 30 daily ob- 
servations of toluene exposure concentrations (TEC) for a single worker doing stain removing. 
The data can be found in Takagi et al. (1997) who propose a simple model fit using maximum 
likelihood estimation for the inverse Gaussian distribution. However, it may be argued that 
these data are auto correlated and so an appropriate time series should be fitted. Figure [11] 
shows one-step forecasts means against the TEC data. The forecast means are computed 
using the above DGLM model for the inverse Gaussian response, using At = A. The results 
show that a low value of the discount factor 6 = 0.5 and a low value of A = 0.01 yield the best 
forecasts. The posterior mean E(/it|y*) is plotted in Figure [T2l from which we can clearly see 
that there is a time-varying feature of the parameters of the inverse Gaussian distribution. 
This is failed to be recognized in Takagi et al. (1997). These authors propose estimates for 
the mean and the scale of the inverse Gaussian distribution as 16.7 and 6.4, which are both 
larger than the mean of the posterior means (E(/ii|y"^) -|- • • • -|- E(/i3o|y^'^))/30 = 14.48 and 
A = 0.01. We note that from Figure [11] as A increases, the forecast performance deteriorates 
so that a value of A near 6.4 would yield poor forecast accuracy. The model we propose here 
exploits the dynamic behaviour of /tt and it is an appropriate model for forecasting. 



4 Concluding comments 

In this paper we discuss approximate Bayesian inference of dynamic generalized linear mod- 
els (DGLMs), following West et al. (1985) and co-authors. Such an approach allows the 
derivation of the multi-step forecast distribution, which is a useful consideration for carrying 
out error analysis based on residuals, on the likelihood function, or on Bayes factors. We 
explore all the above issues by examining in detail several examples of distributions including 
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(a) sensitivity of lambda 



(b) sensitivity of delta 




Figure 11: One-step forecast mean for the TEC data; panel (a) shows the actual data (solid 
line), the one-step forecasts with 6 = 0.5 and A = 0.01 (dashed line), and the one-step 
forecasts with 6 = 0.5 and A = 1 (dotted line); panel (b) shows the actual data (solid line), 
the one-step forecasts with d = 0.5 and A = 0.01 (dashed line), and the one-step forecasts 
with (5 = 0. 9 and A = 0. 01 (dotted line). 



binomial, Poisson, negative binomial, geometric, normal, log-normal, gamma, exponential, 
Weibull, Pareto, two special cases of the beta, and inverse Gaussian. 

We believe that DGLMs offer a unique statistical framework for dealing with a range 
of statistical problems, including business and finance, medicine, biology and genetics, and 
behavioural sciences. In most of these areas, researchers are not well aware of the advantages 
that Bayesian inference for DGLMs can offer. In this context we believe that the present 
paper offers a clear description of the methods with detailed examples of many useful response 
distributions. 



Appendix 

Proof of equations ([9]) and (1111) 

First we calculate the mean and variance of the log-gamma and the log-beta distributions. 
Let X follow the gamma distribution with parameters a and /?, with density function 

p{x) = — -x""^exp(-/?x), 
r(a) 
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Figure 12: Posterior mean {E(^t|y*)} of the ETC data. 



where r(.) denotes the gamma function and a,(3 > 0. The density function of y = logX is 



p{y) 



T{a) 



exp((ay) - ^exp(y)). 



The moment generating function of Y is 

My{z) =E(exp(zy)) = 



— -exp( a + z y-/3exp(y d?/ = 
3 i [a) i [a)p^ 

and the cumulant generating function is Ky{z) = log My (z) = logr(a+z) — logr(a) — zlog/3. 
Then we have 



E(y) 



dK{z) 



dz 



^(a)— log/3 and Var(y) 



d'^K{z) 



z=0 



dz^ 



z=0 



dtp{a) 
da ' 



(A-1) 



where ■(/'(•) is the digamma function, which is defined by 'ip{x) = dlogT{x)/dx and the 
derivative is known as the trigamma function (Abramowitz and Stegun, 1964). 

For the log-beta distribution, let X follow the beta distribution, with density function 

where a, /3 > and < x < 1. The density function of y = log X is 

r(a + P) ex.p{ay) 



p{y) 



r(a)r(/3) (l + exp(y)) 



a+/3' 
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with moment generating function 



My{z) =E(exp(zy)) 



T{a)T{(3) 



exp((Q; + z)y) _ T{a + z)T{(3 - z) 
■ uy — 



(1 + exp(y))"+/3 



for z < (3. The cumulant generating function is K{z) = log My (z) = log T{a + z) + log r(/3 ■ 
z) — logr(a) — logr(/?) and so 



E{Y) 



dK{z) 



dz 



i^{a)-i){(5) and Var(y) 



d^K{z) 



z=0 



dz^ 



2=0 



da 



d/3 



For computational purposes, for large x, we can approximate 'ip{x) by logx and dip{x)/ dx 
by 1/x (Abramowitz and Stegun, 1964). 

Thus, for the calculation of rt and St in equation ([9]), from the prior 7rf|y*~^ ~ B{rt, St — rt), 
we have 



ft = nm\y 



E log 



and 



= \ai{rit\y 



Var log 



t-1 



''A(n) - V'ln - St) = log 



St - n 



_ dil){rt) dipist - n) 



1 



1 



dn 



d{st - n) n St - rt 



(A-3) 



(A-4) 



We obtain ^ by solving (|A-3p and (|A-4p for rt and Sf. 

The calculation of rt and Sf of (jlip follows a similar pattern. To this end, we note the 
gamma prior ~ G{rt,St) and with the logarithmic link we have 



ft = nvt\y 



E(logAt|y 



ip{rt) - log(st) = loj 



n 

St 



and 



= Ya.1 {r]t\y 



Var(logAt|y*-i) 



dip{rt) 



1 



(A-5) 



(A-6) 



Equation ([TT]) is obtained by the solution of ()A-5P and ()A-6P for rt and Sf. 

Since ft and are only guides of the mean and variance of the prior of rjt, the above 
approximations of il^{x) and d'ip{x)/ dx can be used even when x is small. The posterior 
quantities /* = E(?7t|y*) and ql = Var(?7t|y*) are calculated in a similar way, but here we use 
the full approximations iIj{x) = \ogx + x~^ and dip{x)/ dx = x~^{l — (2x)~^), the details of 
which can be found in Abramowitz and Stegun (1964). 

Proof of the prior ( 1171) and the expectation ( I18|) 

The prior distribution of 7t is 

Piltly^'^) = n{rt, St) exp(rt7j + 2st^^). (A-7) 

This is not a known distribution and so we need to use integration in order to find the constant 
K(rt, St)- Since 7t < 0, we need to evaluate 



I 



exp{rtjt + 2sty/^t) djt 
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By applying the substitution y = \J—'^t we have 



exp 1 -n ( y - j ydy = 2exp ) h. 



Now 1\ can be written as 

•5* / / / St \ \ , , / / / \ I St 



Ii = —J exp \-rt l^y - — j \^y + j^ I (^^ ~ ~ j \ [v - — ] dy = I2 + h- 



Integral I2 can be evaluated via the Gaussian integral, i.e 

dy 



l2 = 



^ n J \ , St 7T 



\ 



2 
2rt 



2rt V n 

For /s we use the substitution {y — st/rtY = z and so we get 



^3=0/ exp(-rt2;) = — exp 



Thus, combining /i, /2 and Is, we obtain 

„2\ rrr \ -1 



K(rt, st) = I '=rt (exp (^^) ^'^^ + ^) 



The required prior distribution of nt is immediately obtained by density (|A-7p . if we apply 
the transformation 7f = —l/nt and we use K{rt,st) as above. 
Proceeding with the proof of (fT8|) we have 



IE(/Ut|y*"^) = / fitpifJ-tly*'^) dfit = c — exp ( 2^ ) = c-f' 

) Jo fJ-t \ "^tfJ-t 



where c = (2 exp{st /rt)rt) / {exp(st /rt)st\/ n/rt + 1). To evaluate integral I we note that 
(rt — ntSt)"^ / {rtfj-t) = '''t^i't'tlh^ — stY and by applying the substitution //JT^ = — y and using 
the Gaussian integral, we have 



The required mean (fTSj) is obtained as cl. 
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